Loan Data Exploration by Allan Visochek ## Investigate The following variables Borrower Attributes: >CreditGrade >CreditScoreRangeLower >EmploymentStatus >IsBorrowerHomeOwner >LoanMonthsSinceOrigination >StatedMonthlyIncome >IncomeRange >BorrowerState >Occupation >DebtToIncomeRatio Loan Attributes: >Term >BorrowerRate >LoanOriginalAmount ========================================================

Univariate Plots Section

Loan Info

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975
BorrowerRate Ranges from 0 to .05
##     10% 
## 0.09886
##    90% 
## 0.3099
most borrowers have a rate between .1 and .31

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000
all loans are between 0 and 35,000…
75% of loans are under 12,000…

All loans have a term of either 1,3 or 5 years.

Borrower Info

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134
the large majority of loans go to individuals who are employed full time…

IsBorrowerHomeowner:

## False  True 
## 56459 57478
about half and half
##          AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA 
##  5515   200  1679   855  1901 14717  2210  1627   382   300  6720  5008 
##    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
##   409   186   599  5921  2078  1062   983   954  2242  2821   101  3593 
##    MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY 
##  2318  2615   787   330  3084    52   674   551  3097   472  1090  6729 
##    OH    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT 
##  4197   971  1817  2972   435  1122   189  1737  6842   877  3278   207 
##    WA    WI    WV    WY 
##  3048  1842   391   150
pretty even distribution by state population
##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                        Nurse (LPN) 
##                                346                                492 
##                         Nurse (RN)                       Nurse's Aide 
##                               2489                                491 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436

Credit and Ratings

##           A    AA     B     C     D     E    HR 
## 29084 14551  5372 15581 18345 14274  9795  6935
##           A    AA     B     C     D     E    HR    NC 
## 84984  3315  3509  4389  5649  5153  3289  3508   141
A large number of Borrowers either don’t have a prosper rating or a credit score…
lets create new variables to indicate weather a user has a credit grade or a prosper rating
I’m also going to switch the order so that AA comes before A

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591
Most borrowers have a credit score between 600 and 800

Income

##             $0      $100,000+      $1-24,999 $25,000-49,999 $50,000-74,999 
##            621          17337           7274          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806
few loans are given out to individuals with low income, or who are unemployed, no surprises here…
let’s reorder IncomeRange and create a new variable to indicate weather a borrower is reported as having an income

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000

Debt

most debt levels are in the range of 0 to 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Quarter

Lets reorder these by time and try again

thats better… we can see here that the number of loans given out dropped dramatically in the last quarter of 2008, ….

Univariate Analysis

What is the structure of your dataset?

There are 113,937 loans in the dataset with 81 features, 13 of which were used in the analysis:

Numerical Variables:

BorrowerRate CreditScoreRangeLower DebtToIncomeRatio LoanOriginalAmount StatedMonthlyIncome

Ordered factor variables:

(from best to worst / greatest to least…)
Term:

60,36,12 >(note that term was a numerical variable but was transformed to a factor variable because it has very few values)

CreditGrade:

AA,A,B,C,D,E,HR,NC,none

ProsperRating..Alpha.:

AA,A,B,C,D,E,HR,none

IncomeRange:

$75,000-99,999 ; $50,000-74,999 ; $1-49,999 1-24,999 ; $0 ; Not employed; Not displayed

Unordered factor variables:

EmploymentStatus:

Employed, Full-time, Not employed, Part-time, Retired, Self-employed none,Not available, Other,

Occupation:

Accountant/CPA, Administrative Assistant, Analyst, Architect, Attorney, Biologist, Bus Driver, Car Dealer, Chemist, Civil Service, Clergy, Computer Programmer, Construction, Dentist, Doctor, Engineer - Chemical, Engineer - Electrical, Engineer - Mechanical, Executive, Fireman, Flight Attendant, Food Service, Food Service Management, Homemaker, Investor, Judge, Laborer, Landscaping, Medical Technician, Military Enlisted, Military Officer, Nurse (LPN), Nurse (RN), Nurse’s Aide, Other, Pharmacist, Pilot - Private/Commercial, Police Officer/Correction Officer, Postal Service, Principal, Professional, Professor, Psychologist, Realtor, Religious, Retail Management, Sales - Commission, Sales - Retail, Scientist, Skilled Labor, Social Worker, Student - College Freshman, Student - College Graduate Student, Student - College Junior, Student - College Senior, Student - College Sophomore, Student - Community College, Student - Technical School, Teacher, Teacher’s Aide, Tradesman - Carpenter, Tradesman - Electrician, Tradesman - Mechanic, Tradesman - Plumber, Truck Driver, Waiter/Waitress

BorrowerState:

AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WA, WI, WV, WY

Other observations:

The majority of loans go to individuals who are employed full time.

Few loans are given out to individuals with low income, or who are unemployed.

Most Borrowers do not have a credit grade.

Loans amounts range from 0 to $35,000.

75% of loans are for under $12,000.

Nearly all of the loans given out are 100% funded.

Most loans have 0 net principal loss.

Most borrowers have 0 delinquincies in the past 7 years.

Few loans are given out from Q4 2008 through Q2 2009.

What is/are the main feature(s) of interest in your dataset?

The main features of interest are LoanOriginationQuarter and BorrowerRate.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

It is hard to say at this point. All variables mentioned above were selected for the investigation because they are likely to have an impact on the borrower rate.

Did you create any new variables from existing variables in the dataset?

HasCreditGrade <- CreditGrade is available

HasProsperRating <- ProsperRating is available

HasIncome >> IncomeRange is available and greater than 0

Additional variables are created in later sections

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I log transformed the following variables, which had a left leaning distribution:

DebtToIncomeRatio

StatedMonthlyIncome

I reordered the following factor variables:

ProsperScore..Alpha,

CreditGrade,

Incomerange,

LoanOrginationQuarter

I also subsetted the data for StatedMonthlyIncome by IncomeVerifiable.

Bivariate Plots Section

Overview

## Term: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.0929  0.1434  0.1501  0.2064  0.2669 
## -------------------------------------------------------- 
## Term: 36
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1274  0.1815  0.1935  0.2599  0.4975 
## -------------------------------------------------------- 
## Term: 60
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0669  0.1490  0.1870  0.1930  0.2319  0.3304

The average Borrower Rate was decreasing significantly from 2012 to 2014

Loan Amount

The average Loan amount went down steeply in Q4 2008 and rose steadily until 2014.

Debt Level

It doesn’t look like there is a significant relationship from this graph.

It looks like there is a trend, although it is still not quite clear.

Debt level quantile and median BorrowerRate are correlated.

Credit score and ratings

Borrowers without a credit grade have borrower rate that is average relative to the others…

Borrower rates are cleanly distributed among the different prosper ratings. (in contrast to the credit grades which overlap…)

Borrowers without a prosper rating have a median borrower rate that is average relative to the others…

Credit score varies much less among different prosper ratings than it does with credit grades…

Income

EmploymentStatus and BorrowerRate

Unemployed borrowers tend to have a significantly higher median that the rest of the other borrowers…

In General, Higher Paying Occupations with a higher level of education (i.e. Judge, Computer Programmer Engineer) have a lower median borrower rate than lower paying occupations with lower levels of education (i.e. Teacher’s Aide, Nurse’s Aide, College Freshman).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The average Borrower Rate was decreasing significantly from 2012 to 2014

The average Borrower Rate was decreasing significantly from 2012 to 2014

Debt level quantile and median BorrowerRate are correlated

Borrowers without a credit grade have borrower rate that is average relative to the others…

Borrower rates are cleanly distributed among the different prosper ratings. (in contrast to the credit grades which overlap…)

Borrowers without a prosper rating have a median borrower rate that is average relative to the others…

The borrower rate goes down as income range goes up, the exception being the $0 category.

There is a clean and linear relationship between the StatedMonthlyIncome Quantile and the Median BorrowerRate.

In General, Higher Paying Occupations with a higher level of education (i.e. Judge, Computer Programmer Engineer) have a lower median borrower rate than lower paying occupations with lower levels of education (i.e. Teacher’s Aide, Nurse’s Aide, College Freshman).

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Credit score varies much less among different prosper ratings than it does with credit grades…

Most borrowers who have an IncomeRange Recorded have a StatedMonthlyIncome that falls within this range, however there are quite a few exceptions.

What was the strongest relationship you found?

The strongest Relationship was between the Borrower Rate and Credit Grade and Borrower Rate and Prosper Rating.

Multivariate Plots Section

Overview

The distribution of borrower rates is less even from 2009 through 2012.

This may have to do with the relatively low number of borrowers during that time period.

The average loan amount is closely correlated with the number of loans given out.

The inverse of the BorrowerRate seems to follow the trends in the number of loans and LoanOriginalAmount by about 2 years.

CreditGrade&Ratings

Credit score by Quarter

CreditGrade/ProsperRating

ok, now I get it… The company used credit grade to determine the borrower rate up until 2009 and then started using their own metric namely, the Prosper Rating

lets make a new variable to separate the the ealier loans from the latter.

Credit Score

## Warning: Removed 569 rows containing missing values (geom_point).

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 315 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 315 rows containing missing values (stat_summary).
## Warning: Removed 254 rows containing missing values (stat_summary).
## Warning: Removed 315 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).

Credit score is correlated to Borrower Rate, although that correlation changes by quarter.

As of 2009, there is a strict cutoff at a credit score of 600.

It looks like there isn’t much variation in Borrower Rate by credit Score for those borrowers with a credit score of less than 600.

I will define bad credit as having a credit score of less than 600 and treat these loans as a separate category.

LoanOriginalAmount

LoanOriginalAmount by quarter

There is at least some relationship between LoanOriginalAmount and BorrowerRate.

All loans in the earlier period are 3 year loans.

LoanOriginalAmount by quarter and credit grade

The correlation between the borrower rate and the loan amount becomes higher as the CreditGrade goes up…

LoanOriginalAmount by quarter and ProsperRating for 3 year loans

It looks like loan original amount was used to determine the BorrowerRate up until 2011.

Lets look at this in a bit more detail…

The correlation between loan amount and borrower rate also goes up as prosper rating goes up.

1 and 5 year loands

1 year loans were only offered as of Q4 2010, so the loan original amount clearly does not effect the borrower rate for any such loans.

the same goes for the 5 year loans…

Income Level

The median borrower rate is related to IncomeRange.

StatedMonthlyIncome

The trend towards lower BorrowerRate for higher StatedMonthlyIncome is consistent accross all of the quarters.

Debt

DebtToIncomeRatio

The trend towards higher BorrowerRate for Higher DebtToIncomeRatio is consistent accross all of the quarters.

AvailableMonthlyIncome

(1-DebtToIncomeRatio)*StatedMonthlyIncome

The quantile Of Available Monthly income is correlated to the median Borrower Rate.

Employment Status

#### Lets avoid this for now…

other borrower stats

state, occupation and homeowner

Synthesis and Model

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The average loan amount is closely correlated with the number of loans given out.

The inverse of the BorrowerRate seems to follow the trends in the number of loans and LoanOriginalAmount by about 2 years.

Credit score is correlated to Borrower Rate, although that correlation changes by quarter.

As of 2009, there is a strict cutoff at a credit score of 600.

It looks like there isn’t much variation in Borrower Rate by credit Score for those borrowers with a credit score of less than 600.

The correlation between the borrower rate and the loan amount becomes higher as the CreditGrade goes up…

The correlation between the borrower rate and the loan amount becomes higher as the Prosper Rating goes up…

The correlation between loan amount and borrower rate exists until 2011

The median borrower rate is related to IncomeRange.

The trend towards lower BorrowerRate for higher StatedMonthlyIncome is consistent accross all of the quarters.

The quantile Of Available Monthly income is correlated to the median Borrower Rate. ### Were there any interesting or surprising interactions between features?

As obserd in the Credi

All loans in the earlier period are 3 year loans.

1 and 5 year loans are only offered as of 2011.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Yes… get to this later…


Final Plots and Summary

Plot One

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

Description One

Plot Two

Description Two

Plot Three

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

Description Three

Up until 2009, the BorrowerRate was relatively well corellated to the credit score, or credit grade and the loan original amount. …..

Reflection

This prosper loan data is a rich and sophisticated dataset with multiple inerrelated variables. While I was able to identify trends in the borrower rate